What Kobe Bryant and Britney Spears Have in Common: Mining Wikipedia for Characteristics of Notable Individuals

نویسنده

  • Pauline Crystal Ng
چکیده

This paper proposes a statistical methodology for mining Wikipedia to discover characteristics associated with life outcomes. The methodology is demonstrated using first names and childhood environment. By comparing over 35,000 Wikipedia biographies against spatially and tempo rally matched census data, we show that individuals with rare names are twice as likely to appear in Wikipedia (RR 2.43 for females; RR 2.30 for males). This result is supported by past studies. Furthermore, birth location also plays a role in success: individuals born in New York and California are ~2x more likely to become entertainers, and those born in the South are ~1.5x more likely to become athletes. These results validate the proposed methodology of using Wikipedia to study life outcomes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

You are what you like! Information leakage through users' Interests

Suppose that a Facebook user, whose age is hidden or missing, likes Britney Spears. Can you guess his/her age? Knowing that most Britney fans are teenagers, it is fairly easy for humans to answer this question. Interests (or “likes”) of users is one of the highly-available on-line information. In this paper, we show how these seemingly harmless interests (e.g., music interests) can leak privacy...

متن کامل

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...

متن کامل

The Britney Spears Problem

This reprint is provided for personal and noncommercial use. For any other use, please send a request Brian Hayes by electronic mail to [email protected].

متن کامل

Computational Learning Models For Drifting Concepts

In this paper, we explore computational learning models for learning concepts classes where the target concept changes with time, a scenario known in the literature as concept drift. The importance of this topic is immediately evident in the dynamic nature of real-world concepts; if a phenomenon can be represented by a static concept, then it often does not even require learning. The most obvio...

متن کامل

Employing data mining to explore association rules in drug addicts

Drug addiction is a major social, economic, and hygienic challenge that impacts on all the community and needs serious threat. Available treatments are successful only in short-term unless underlying reasons making individuals prone to the phenomenon are not investigated. Nowadays, there are some treatment centers which have comprehensive information about addicted people. Therefore, given the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012